Predicting the host of influenza viruses based on the word vector

Authors

  • Beibei Xu
  • Zhiying Tan
  • Kenli Li
  • Taijiao Jiang
  • Yousong Peng
Abstract

Newly emerging influenza viruses continue to threaten public health. Rapid determination of the host range of newly discovered influenza viruses would assist in early assessment of their risk. Here, we attempted to predict the host of influenza viruses using a Support Vector Machine (SVM) classifier based on the word vector, a new representation and feature extraction method for biological sequences. The results show that the length of the words within the word vector, the sequence type (DNA or protein), and the species from which the sequences used to generate the word vector were derived all influence the performance of the models in predicting the host of influenza viruses. In nearly all cases, the models built on the surface proteins hemagglutinin (HA) and neuraminidase (NA) (or their genes) produced better results than those built on internal influenza proteins (or their genes). The best performance was achieved when the model was built on the HA gene using word vectors (three-letter words) generated from DNA sequences of the influenza virus. This model achieved accuracies of 99.7% for avian, 96.9% for human and 90.6% for swine influenza viruses. Compared with sequence homology best-hit searches using the Basic Local Alignment Search Tool (BLAST), the word vector-based models still need further improvement in predicting the host of influenza A viruses.
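To make the idea of "words" in a sequence concrete, the following is a minimal sketch, not the authors' implementation. It assumes overlapping three-letter words and assumes that a sequence is summarized by averaging the vectors of its words; the abstract specifies neither choice. The function names and the tiny embedding table are hypothetical and exist only for illustration. In practice the embedding would be learned from many sequences, and the resulting feature vector would be passed to an SVM classifier.

```python
from typing import Dict, List


def kmer_words(sequence: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping words of length k (assumed scheme)."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sequence_vector(
    sequence: str, embeddings: Dict[str, List[float]], k: int = 3
) -> List[float]:
    """Average the vectors of all words that have an embedding.

    Averaging is an assumption made for this sketch; words missing from
    the embedding table are skipped.
    """
    words = [w for w in kmer_words(sequence, k) if w in embeddings]
    if not words:
        raise ValueError("no word in the sequence has an embedding")
    dim = len(embeddings[words[0]])
    totals = [0.0] * dim
    for word in words:
        for j, value in enumerate(embeddings[word]):
            totals[j] += value
    return [total / len(words) for total in totals]


# Hypothetical two-dimensional embedding table, for illustration only.
toy_embeddings = {"ATG": [1.0, 0.0], "TGC": [0.0, 1.0]}

if __name__ == "__main__":
    print(kmer_words("ATGCA"))
    print(sequence_vector("ATGCA", toy_embeddings))
```

The fixed-length vector returned by `sequence_vector` is the kind of input a classifier expects, regardless of how long the original sequence is.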


Similar resources

Caspase Cleavage Motifs of Influenza Subtypes Proteins: Alternations May Switch Viral Pathogenicity

Background and Aims: The caspases are unique proteases that mediate host cell apoptosis during viral infection. In this study, we identified the caspase cleavage motifs of H5N1 and H9N2 influenza viruses isolated during 1998-2012. Materials and Methods: Amino acid sequences of the eleven proteins encoded by the viruses, as the caspase substrates, were downloaded from NCBI. The caspase cleavage mot...


Bacillus subtilis as a Host for Recombinant Hemagglutinin Production of the Influenza A (H5N1) Virus

Abstract Background and Aims: Influenza A (H5N1) viruses circulating in animals might evolve and acquire the ability to spread from human to human and thus start a pandemic. Hemagglutinin (HA) has been shown to play a major role in binding of influenza virus to its target cell, and the main neutralizing antibody responses are elicited against this region. Recent studies have shown that...


Prokaryotic Expression of Influenza A virus Nucleoprotein Fused to Mycobacterial Heat Shock Protein70

Background and Aims: The novel approaches in influenza vaccination have targeted more conserved viral proteins such as nucleoprotein (NP) to provide cross protection against all serotypes of influenza A viruses. Influenza specific cytotoxic T lymphocytes (CTL) are able to lyse influenza-infected cells by recognition of NP, the major target molecule in virus for CTL responses. On the other hand,...


Genetic and phylogenetic analysis of the ribonucleoprotein complex genes of H9N2 avian influenza viruses isolated from commercial poultry in Iran

BACKGROUND: The H9N2 subtype of avian influenza viruses (AIVs) has been isolated in multiple avian species in many European, Asian, African and American countries. Since the first outbreak of H9N2 virus in Iran in 1998, this virus has widely circulated throughout the country, resulting in major economic losses in chicken flocks. Several amino acids in the virus ribonucleoprotein (RNP) complex includi...


Molecular Identification of Pre-Existing Immunity in Human against H9N2 Influenza Viruses Using HLA-A*0201 Binding Peptides

Background and Aims: To assess the contribution of the genetic and antigenic diversity of H9N2 influenza viruses to evasion of immune responses, cytotoxic T lymphocyte (CTL) epitopes in the hemagglutinin (HA) protein restricted by HLA binding peptides were identified. Materials and Methods: Phylogenetic analyses were carried out for all full-length HA and deduced amino acid sequences of H9N2 viruses available ...



Journal title:

Volume 5, Issue

Pages -

Publication date: 2017